Accessing Data on SGI Altix: An Experience with Reality

Authors

  • Guido Juckeland
  • Matthias S. Müller
  • Wolfgang E. Nagel
  • Stefan Pflüger
Abstract

The SGI Altix system architecture supports very large ccNUMA shared-memory systems. Nevertheless, the system layout limits the sustained memory performance, and these limits can only be avoided by selecting the “right” data access strategies. The paper presents the results of cache and memory performance studies on the SGI Altix 350. It demonstrates the limitations and benefits of the system and of the underlying Intel Itanium 2 processor.


Similar resources

Evaluating Performance of the SGI Altix 4700 via Scientific Benchmark and Micro-Benchmarks

We evaluated the performance of the SGI Altix 4700 using several well-known benchmarks. In performing these experiments we hope to gain a better understanding of the capabilities and limitations of the system, and thus be able to improve upon the design in future generations or develop tools that enhance the performance of the system.


Optimizing OpenMP Parallelized DGEMM Calls on SGI Altix 3700

Using functions of parallelized mathematical libraries is a common way to accelerate numerical applications. Computer architectures with shared memory characteristics support different approaches for the implementation of such libraries, usually OpenMP or MPI. This paper’s content is based on the performance comparison of DGEMM calls (floating point matrix multiplication, double precision) with...


High Performance FFT on SGI Altix 3700

We have developed a high-performance FFT on SGI Altix 3700, improving the efficiency of the floating-point operations required to compute the FFT by using a kind of loop fusion technique. As a result, we achieved a performance of 4.94 Gflops for a 1-D FFT of length 4096 on a 1.3 GHz Itanium 2 (95% of peak), and a performance of 28 Gflops for a 2-D FFT of 4096 with 32 processors. Our FFT kernel outperf...


High performance computing using MPI and OpenMP on multi-core parallel systems

The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems—distributed memory across nodes and shared memory with non-uniform memory access within each node—poses a challenge to application developers. In this paper, we study a hybrid approach to programm...


Revealing the Performance of MPI RMA Implementations

The MPI remote-memory access (RMA) operations provide a different programming model from the regular MPI-1 point-to-point operations. This model is particularly appropriate for cases where there are multiple communication events for each synchronization and where the target memory locations are known by the source processes. In this paper, we describe a benchmark designed to illustrate the perf...




Publication date: 2006